Program components having multiple selectable implementations

ABSTRACT

Libraries and individual program components are provided with a common interface and a number of alternative implementations (e.g. hash table, tree, compressed) which can be selected. The component is instrumented to measure a cost of each of its alternative implementations, both independent of and in the context of the interaction of that component with other components of the computer program. Based on measured cost, the desired implementation is chosen for the component by an external controller that is generic to the computer program or by an application program that interfaces with the library or component.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 60/267,573, filed Feb. 9, 2001.

[0002] The present application is related to the present inventors' applications entitled “Minimizing Interaction Costs Among Components Of Computer Programs” Ser. No. ______ (IBM Docket YOR920020022), and “Characterization Of Objects Of A Computer Program While Running Same” Ser. No. ______ (IBM Docket YOR920020050), which were filed on the same day as the present application. These related applications are incorporated herein by reference.

FIELD OF THE INVENTION

[0003] This invention generally relates to the field of optimization of computer programs, and more particularly relates to a computer system that transforms programs so that they run more quickly, but produce the same results. More particularly, the present invention relates to the use of components, which have a number of selectable implementations, to help achieve such optimization.

PRIOR ART

[0004] Languages like SETL (see (1) Schwartz J, “Automatic Data Structure Choice in a Language of Very High Level” CACM December 1975, pp. 722-728; (2) Schwartz J, “Optimization of Very High Level Languages, Parts I, II” J. Of Comp. Lang, 1975 pp. 161-218) and VERS (J. Earley. High level iterators and a method for automatically designing data structure representation. J. of Computer Languages, 1976) also propose alternative implementations for basic data types such as sets. SETL uses type-based static analysis to select the best implementation for a given instance of the set representation. Our solution is to model the problem as a graph problem and use dynamic profile information to compute edge weights and to characterize properties that are distinguishable from one run to another run. Both SETL and VERS rely purely on static information, which is likely to give them worse performance than selection based on profile information. It is believed that a need exists to write efficient libraries and other components.

[0005] For consistency of definition in the context of the present application, it should be understood that the term “property”, with respect to an object or component of a computer program, is broad, and includes narrower terms such as “location”, “parameter”, and “implementation”. In turn, “implementation” includes “data representation” such as “string representation” (e.g. ASCII, EBCDIC, UNICODE) and “data structure” (e.g. hash, tree, compressed). Thus, it will be understood that “implementation” does not encompass “location”, nor “parameter” within its meaning. Moreover, in the context of the present invention, “object”, “entity”, and “component” shall be interpreted as having substantially the same meaning, while “library” shall be understood to mean a group of object definitions or component definitions.

SUMMARY OF THE INVENTION

[0006] Broadly, the present invention provides a method of minimizing the cost of using a component of a computer program, the aforesaid method comprising the steps of:

[0007] a) providing the aforesaid component with a plurality of explicit selectable alternative implementations which share a common component interface and semantics;

[0008] b) instrumenting the aforesaid component to gather cost-related information during at least a partial run of the aforesaid program;

[0009] c) providing the component with a cost estimator for using the aforesaid cost-related information to estimate a cost for using each of the explicit selectable alternative implementations in running the program;

[0010] d) based on the costs estimated in step (c), selecting one of the aforesaid explicitly selectable implementations for a subsequent at least partial run of the program.

[0011] Preferably, a default implementation is used during said at least partial run.

[0012] According to a preferred embodiment, selecting step (d) is carried out by another program component which is operable as a controller for the program. According to another alternative, the selection may be carried out by an application program which interfaces with the aforesaid component.

[0013] The invention also provides a computer readable medium readable by a computer and including computer instructions for carrying out the methods disclosed hereinabove.

[0014] The invention also provides a computer program comprising a plurality of components which interact during running thereof, at least one of the aforesaid components comprising:

[0015] a) a plurality of explicit selectable alternative implementations;

[0016] b) a common interface and semantics for receiving messages from an other one of the aforesaid components and sending messages thereto;

[0017] c) an instrumentation for gathering cost-related information during at least a partial run of the computer program;

[0018] d) a cost estimator for using the cost-related information to estimate a cost for using each of the explicit selectable alternative implementations; and

[0019] e) a selector for choosing one of the alternative implementations in response to a message received at the aforesaid interface from one of an application program and an other one of the program components.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0020] Writing efficient libraries or generic components intended to interact with other components is often difficult or impossible because the library and component writers do not understand the runtime context in which the library or component will be used. Moreover, the user of a library or component should not need to understand the internal workings of the component in order to use it efficiently or to decide exactly which of a number of possible alternative candidate components to choose, and in fact sometimes for the purpose of protecting intellectual property the user is not allowed to know the internal workings. Therefore, the user of a component cannot know if the component is appropriate in their context. Use of inappropriate components is a major reason for inefficiency of modern software.

[0021] According to another preferred embodiment of the present invention, a computer system and method allows a component writer to specify multiple algorithms and data structures for their implementation and to have the run time environment determine which one to choose, using information about the context in which the component will be used. The component writer will not only provide implementations but will also include instrumentation. The instrumentation will be run along with the implementations, at least some of the time. Based on the information gathered by the instrumentation, the appropriate implementation will be chosen.

[0022] The preferred embodiment for this system and method uses the well-known Hyper/J system. Hyper/J allows for the composition of two or more programs written in Java. In particular, it allows taking two descriptions of a class and putting the methods described in both descriptions together in a variety of ways. We propose a new composition rule in the Hyper/J sense. A simple example of a known composition rule is that if both descriptions include a definition of a particular method, the composed description will in fact run both methods when the method name is invoked in the program on an instance of that class. We will call this composition rule “merge”.

[0023] In our composition rule the programmer supplies instrumentation methods, two or more implementations for the methods of the class, and a formula for each implementation. There is a list of implementations, and the first one is somewhat arbitrarily chosen as the preferred implementation. On the first run of the program the composed class will run as though it were simply a composition of the instrumentation methods and the preferred implementation using the merge composition rule. The methods of the instrumentation would set some class attributes, which would be placed in persistent storage at the end of the program.
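The first-run behavior described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Hyper/J, the instrumentation is written by hand next to the implementation calls rather than composed by a "merge" rule, and every name in it (ProfiledSetSketch, addCalls, successorCalls, maxSize, save, load) is hypothetical. java.util.TreeSet stands in for the preferred implementation and a java.util.Properties file stands in for persistent storage.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical illustration; not Hyper/J and not the code of this application.
final class ProfiledSetSketch {
    // Class attributes set by the instrumentation methods.
    static long addCalls = 0;
    static long successorCalls = 0;
    static long maxSize = 0;

    // Preferred (first-listed) implementation, used on the profiling run.
    private final TreeSet<Integer> impl = new TreeSet<>();

    void add(int e) {
        impl.add(e);                               // implementation method
        addCalls++;                                // instrumentation run alongside it
        maxSize = Math.max(maxSize, impl.size());
    }

    Integer successor(int e) {
        successorCalls++;                          // instrumentation
        return impl.higher(e);                     // least element strictly greater than e, or null
    }

    // Place the attributes in persistent storage at the end of the program.
    static void save(Path file) throws IOException {
        Properties p = new Properties();
        p.setProperty("addCalls", Long.toString(addCalls));
        p.setProperty("successorCalls", Long.toString(successorCalls));
        p.setProperty("maxSize", Long.toString(maxSize));
        try (OutputStream out = Files.newOutputStream(file)) {
            p.store(out, "attributes gathered on the profiling run");
        }
    }

    // Read the attributes back at the start of a later run.
    static Properties load(Path file) throws IOException {
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            p.load(in);
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        ProfiledSetSketch s = new ProfiledSetSketch();
        for (int i = 0; i < 10; i++) s.add(i);
        s.successor(3);
        Path file = Files.createTempFile("profile", ".properties");
        save(file);
        System.out.println(load(file));
    }
}
```

The counters are static so that they behave like class attributes shared by all instances, which matches the simple form of the rule before any per-allocation-site refinement.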

[0024] At the start of each subsequent run of the program, the values of the attributes are used in the formulas and each formula produces a value. The formula computes an approximation to the cost that the corresponding implementation would have incurred on the previous run. Thus, the implementation with the smallest value would have been the most efficient.

[0025] That implementation will be the one that should be used next time. The choosing of the implementation might be done by the JVM, which would have been modified to understand classes composed using this composition rule. A more complex implementation might avoid using a modified JVM; instead, when the first object of a class is created, the composed class could be constructed in such a way that the formula is evaluated and the implementation chosen.

[0026] A more sophisticated technique would involve modifying each new statement in the user program that creates an instance of the composed class. For each new statement, a subclass of the composed class would be created. Each subclass would have the same code, but would store the results of its instrumentation in a different class object. This would have to be done carefully and perhaps could not be done in every case, because if the user has written code to modify the class object's attributes, then that code must refer to the same attributes for all the instances of the original class, not just the ones allocated at a particular new statement.

[0027] A possible solution is to have the user code modify the attributes of the super class, which corresponds to the user's source class, rather than modifying the attributes of the inserted class variables. The advantage of separating the class into different almost identical classes, one for each new statement, is that if different implementations are appropriate for different objects in the running program, this separation at least allows that some of the time. In particular, when all or most objects created at a particular point in the program should have the same implementation, then this mechanism would allow that.
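The selection step can be sketched as follows. The sketch assumes the variant in which no modified JVM is used and the formulas are evaluated in ordinary code. The cost formulas themselves are invented for illustration (logarithmic cost per operation for a tree, constant-time add but a linear scan per successor query for an unordered hash table); they are not taken from this application, and the names FormulaSelectionSketch, Profile, FORMULAS and choose are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration of "evaluate one formula per implementation, pick the smallest".
final class FormulaSelectionSketch {
    // Attribute values recorded by the instrumentation on the previous run.
    static final class Profile {
        final long addCalls;
        final long successorCalls;
        final long maxSize;

        Profile(long addCalls, long successorCalls, long maxSize) {
            this.addCalls = addCalls;
            this.successorCalls = successorCalls;
            this.maxSize = maxSize;
        }
    }

    // Ordered list of implementations; the first entry is the preferred one.
    static final Map<String, ToDoubleFunction<Profile>> FORMULAS = new LinkedHashMap<>();
    static {
        // Assumed cost model: every tree operation costs about log2(size).
        FORMULAS.put("TreeSet", p -> (p.addCalls + p.successorCalls) * log2(p.maxSize));
        // Assumed cost model: hash add is O(1), successor needs a scan of all elements.
        FORMULAS.put("HashSet", p -> p.addCalls + p.successorCalls * (double) p.maxSize);
    }

    static double log2(long n) {
        return Math.log(Math.max(n, 2)) / Math.log(2);
    }

    // Returns the name of the implementation whose formula yields the smallest value.
    // Ties go to the earlier (preferred) entry because the comparison is strict.
    static String choose(Profile p) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, ToDoubleFunction<Profile>> e : FORMULAS.entrySet()) {
            double cost = e.getValue().applyAsDouble(p);
            if (cost < bestCost) {
                bestCost = cost;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose(new Profile(1000, 0, 1024)));   // add-heavy profile
        System.out.println(choose(new Profile(10, 1000, 1024)));  // successor-heavy profile
    }
}
```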
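The per-allocation-site idea can be illustrated with hand-written subclasses. In the sketch below the subclasses that the text describes as being created for each new statement are written out manually, which is an assumption made for brevity; the names AllocationSiteSketch, ComposedSet, ComposedSetSite1, ComposedSetSite2 and sharedUserAttribute are hypothetical. Each subclass keeps its own instrumentation counter, while the user-visible attribute stays on the superclass so that all instances continue to share it.

```java
import java.util.TreeSet;

// Hypothetical illustration: one near-identical subclass per "new" statement.
final class AllocationSiteSketch {
    static class ComposedSet {
        // Attribute that user code may modify; kept on the superclass so that
        // every instance of the original class sees the same value.
        static int sharedUserAttribute = 0;

        private final TreeSet<Integer> impl = new TreeSet<>();

        void add(int e) {
            impl.add(e);
            recordAdd();
        }

        // Instrumentation hook; each site-specific subclass stores its own result.
        void recordAdd() { }
    }

    static final class ComposedSetSite1 extends ComposedSet {
        static long addCalls = 0;                  // instrumentation data for site 1 only
        @Override void recordAdd() { addCalls++; }
    }

    static final class ComposedSetSite2 extends ComposedSet {
        static long addCalls = 0;                  // instrumentation data for site 2 only
        @Override void recordAdd() { addCalls++; }
    }

    public static void main(String[] args) {
        ComposedSet a = new ComposedSetSite1();    // rewritten from "new ComposedSet()" at site 1
        ComposedSet b = new ComposedSetSite2();    // rewritten from "new ComposedSet()" at site 2
        for (int i = 0; i < 100; i++) a.add(i);
        b.add(1);
        System.out.println("site 1 adds: " + ComposedSetSite1.addCalls);
        System.out.println("site 2 adds: " + ComposedSetSite2.addCalls);
    }
}
```

Because a static field declared in a superclass is not duplicated in its subclasses, an access written through either subclass name reaches the single superclass attribute, which is the property the paragraph above relies on.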

EXAMPLE Multiple Implementations of a Library or Component

[0028] This section will show by example how a component or library writer can provide multiple implementations of their component. They will be responsible not only for providing multiple implementations but also for providing code that determines the cost of using those implementations. The cost comes in two flavors: 1) the cost incurred by an object when using the implementation by itself and 2) the cost of the interactions of two objects when the objects have different implementations. At run time the system will be responsible for using the information gathered during the profiling runs from the cost functions to choose good implementations.

[0029] While in some cases the decision about the appropriate choice of implementation of an object can be made by examining the kind of use that is made of it, in other cases this decision would have to be coordinated with the corresponding decisions made for other objects that interact with this object. The latter happens when the costs for interactions are relatively high. For example, the decision about the placement of an object in a network of computers is affected by the decision about the placement of other objects that interact with this object. The problem thus becomes a combinatorial one, which even an experienced programmer cannot solve well without the kind of analysis discussed herein.

[0030] We illustrate the concepts using a Set class of objects. The basic Set operations are defined in the outer Set class. The inner nested classes provide multiple implementations for the operations defined in the outer Set class. Initially mySet will be set to a default implementation for the profiling run, and can be reset to a different implementation in later runs using the analysis framework described herein. Set is used as an illustrative example. The choice of implementations of sets often is not fundamentally dependent on the way that set interacts with other sets. As we alluded to earlier, in other optimizations such as distribution of objects the interaction is the dominant issue. We chose sets because they illustrate the concepts simply.

[0031] In order to relate the following code lines with elements of the present invention that have been summarized hereinabove, comments in UPPER CASE will be inserted.

class Set {
  private Object mySet = null;
  private int size;

[0032] THE EXISTENCE OF A COMMON COMPONENT INTERFACE AND SEMANTICS NOW BEGINS TO BE EXPRESSED.

  Set(String implementation) {
    // mySet is set to the appropriate implementation based on the
    // input String implementation.
    if (implementation.equals("default") || implementation.equals("TreeSet"))
      mySet = new TreeSet();
    else if (implementation.equals("HashSet"))
      mySet = new HashSet();
    ...etc.
  }
  void add(Element e) { mySet.add(e); }
  // return successor element of e
  Element successor(Element e) { return mySet.successor(e); }
  void remove(Element e) { mySet.remove(e); }
  Set copy(String implementation) {
    // Return a copy of this set which has implementation determined by the
    // input string implementation
    ...............................
  }
  int size() { return mySet.size(); }
  Set union(Set s, String implementation) {
    // Return a set which is the union of this set and s and has the
    // implementation determined by the string implementation.
    .....................
  }

[0033] EXPRESSION OF THE COMMON COMPONENT INTERFACE IS NOW COMPLETE.

[0034] OPERABILITY OF THE COMPONENT IN A PLURALITY OF IMPLEMENTATIONS NOW BEGINS TO BE EXPRESSED.

[0035] class TreeSet{//tree implementation

[0036] TreeSet( ){ . . . };

[0037] //implement the operations defined in Set.

[0038] void add(Element e) { . . . }

[0039] Element successor(Element e) { . . . }

[0040] void remove(Element e) { . . . }

[0041] //implement methods to compute

[0042] }

[0043] THE NEXT 4 LINES, LATER REPEATED FOR OTHER ALTERNATIVE IMPLEMENTATIONS, EXPRESS THE INSTRUMENTATION OF THE COMPONENT TO MEASURE COST IN A TREE IMPLEMENTATION.

[0044] // cost functions for TreeSet

[0045] int treeSetCostAdd( ){return log(size());}

[0046] int treeSetCostSuccessor( ){ . . . }

[0047] int treeSetCostRemove( ){. . . }

[0048] class HashSet {// hash implementation

[0049] HashSet( ){ . . . }

[0050] // implement the operations defined in Set

[0051] void add(Element e){ . . . };

[0052] . . .

[0053] }

[0054] // define the corresponding cost functions for HashSet

[0055] int hashSetCostAdd( ){ }

[0056] . . .

[0057] class CompSet {//compressed set

[0058] CompSet( ){ . . . }

[0059] // implement the operations defined in Set.

[0060] . . .

[0061] }

[0062] // implement cost functions for CompSet

[0063] . . .

[0064] EXPRESSION OF THE OPERABILITY OF THE COMPONENT IN A PLURALITY OF ALTERNATIVE IMPLEMENTATIONS IS NOW COMPLETE.

[0065] // Next we define transformation functions

[0066] Set Tree2Hash(Set tree) {

[0067] // TreeSet to HashSet representation

[0068] }

[0069] Set Comp2Tree (Set comp) {

[0070] // CompSet to TreeSet representation

[0071] }

[0072] THE NEXT 4 LINES EXPRESS THE INSTRUMENTATION OF THE COMPONENT TO MEASURE COST BASED ON MINIMIZATION OF INTERACTION WITH OTHER COMPONENTS

[0073] // define a single cost function for

[0074] // all the above transformations

[0075] int costTransformation( ){ . . . }

[0076] }

[0077] The Set class consists of four parts: (1) the basic operations defined in the outer class, (2) different implementations for the basic operations defined in the inner classes, (3) transformation functions to transform one implementation to another, and (4) cost functions defined for each implementation and a cost function for transformations. In this example we will assume that the cost of transforming between two different implementations is the same regardless of which implementations are involved. This approximation simplifies the decision problem and avoids requiring n² cost functions. In many cases the simplification is sufficiently accurate.

[0078] The implementor is responsible for providing cost functions that return the cost of each call in the different implementations, so the cost function for the successor query on TreeSet should return the logarithm of the size of the tree. If two variables contain sets, there is a cost of cloning from one to the other, namely the cost of the copy. In our model the user who invokes the copy operation should not know the underlying representation. So, as part of a copy operation, the system will invoke the correct transfer function, say tree2hashset, in addition to making the copy. There is a cost of doing that additional conversion, and in addition to writing the tree2hashset function the author of the multiple implementations must provide the coercion cost function so the system can understand the penalty of using different implementations.
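The copy-with-conversion behavior can be sketched as follows. This is a simplified stand-in, not the Set class of the example: java.util.TreeSet and java.util.HashSet play the roles of the two implementations, the single transformation cost is assumed to be one unit per element converted, and the names CopyCostSketch, FlexSet, normalize and transformationCost are hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of a copy that hides the representation from the caller
// and records a coercion cost only when source and target implementations differ.
final class CopyCostSketch {
    // Accumulated cost of all conversions observed so far.
    static long transformationCost = 0;

    // "default" (and any unrecognized name) maps to the tree implementation.
    static String normalize(String name) {
        return name.equals("HashSet") ? "HashSet" : "TreeSet";
    }

    static final class FlexSet {
        final String implementation;
        final Collection<Integer> elements;

        FlexSet(String implementation) {
            this(implementation, List.of());
        }

        private FlexSet(String implementation, Collection<Integer> source) {
            this.implementation = normalize(implementation);
            if (this.implementation.equals("HashSet")) {
                this.elements = new HashSet<>(source);
            } else {
                this.elements = new TreeSet<>(source);
            }
        }

        void add(int e) {
            elements.add(e);
        }

        // Assumed single cost function for every transformation: one unit per element.
        long costTransformation() {
            return elements.size();
        }

        // The caller names only the desired implementation; conversion happens here.
        FlexSet copy(String target) {
            if (!normalize(target).equals(implementation)) {
                transformationCost += costTransformation();
            }
            return new FlexSet(target, elements);
        }
    }

    public static void main(String[] args) {
        FlexSet s1 = new FlexSet("default");
        s1.add(1);
        s1.add(2);
        s1.add(3);
        FlexSet same = s1.copy("TreeSet");   // same representation, no coercion cost
        FlexSet other = s1.copy("HashSet");  // different representation, cost recorded
        System.out.println(same.implementation + " " + other.implementation
                + " cost=" + transformationCost);
    }
}
```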

[0079] A client that uses the Set class will simply create instances of objects of the Set class. Without our analysis the client will get the default implementation (TreeSet in our case). The implementor of the Set class in this case chose TreeSet because it had the best worst-case performance and so was likely to run acceptably during the profile run. Using our framework, a runtime system or a compiler or the application program will act as a controller to automatically ensure that the client, after the profile run, gets the best of the set implementations defined by the nested classes. In the following example we will express code for a simple client (i.e. application program) that wants to use the above Set. The process of selection of component implementation (controlling) is done in the application code.

class Implementation {
  public static String s1, s2,....,s5;
}
class Client {
  public static void main(String s[]) {
    // Input from the user whether this is a monitoring run
    // If this is a monitoring run, set up the system to do the monitoring
    // else just read the required implementation from storage
    if (!monitoring_run) {

[0080] THE FOLLOWING LINES EXPRESS THE SELECTION, IN ACCORDANCE WITH THIS ILLUSTRATIVE EMBODIMENT, OF AN ALTERNATIVE IMPLEMENTATION BASED ON COST INFORMATION WHICH IS IN STORAGE FROM A PRIOR MONITORING RUN.

      Read_from_storage(Implementation.s1,....,Implementation.s5);
    }
    else // prepare for the monitoring run
    {
      // First set the implementations to be default.
      Implementation.s1 = "default";
      .....
      Implementation.s5 = "default";
      // Build the ObjectAffinityGraph with the nodes for each set
      ObjectAffinityGraph oag = new ObjectAffinityGraph(s1,s2,s3,s4,s5);
    }
    Set s1 = new Set(Implementation.s1);
    // add a bunch of elements to s1 instance
    // if this is a monitoring run, then add the cost of the operations to the
    // appropriate place in the oag.
    if (monitoring_run) {
      // for each add operation on s1 add weight determined by the cost
      oag.addEdgeWeight(s1, TreeSet, -s1.treeSetCostAdd());
      oag.addEdgeWeight(s1, HashSet, -s1.hashSetCostAdd());
      //etc.
    }
    ...
    Set s2 = s1.copy(Implementation.s2);
    if (monitoring_run) {
      // add the cost of interaction between s1 and s2.
      oag.addEdgeWeight(s1, s2, s1.costTransformation());
    }
    // perform a bunch of successor queries such as
    // find all elements that fall between two values
    // if it is a monitoring run, then add the appropriate costs to oag.
    ...
    Set s3 = new Set(Implementation.s3);
    // add a bunch of elements
    // if it is a monitoring run, add the appropriate costs to oag
    ...
    Set s4 = s3.copy(Implementation.s4);
    // if it is a monitoring run, add the interaction cost between s3 and s4 in oag
    // burn s4 into a ROM
    burnROM(s4);
    // if it is a monitoring run, add ....
    Set s5 = s4.union(s2, Implementation.s5); // s5 is a new set
    // If this is a monitoring run add appropriate edge weight between s4 and s5
    // and also between s2 and s5.
    // perform a bunch of successor queries
    // Finally if this is a monitoring run then
    // perform the graph cutting to determine a good (and if possible, the best)
    // implementation for each of the data structures s1,...,s5 and store this
    // for future use.
    ...
  }
}

[0081] The Client class creates a number of set instances and performs different kinds of operations on different instances. Using the analysis presented in “Minimizing Interaction Costs Among Components Of Computer Programs” Ser. No. ______ (IBM Docket YOR920020022), a runtime system or a compiler or the application program (the controller) can assign the right implementation for each of the set instances s1, s2, s3, s4, and s5. In this example, we show explicitly how the application program can select the appropriate component implementation. During a profile (monitoring) run of the client code, information is gathered. During that run the default implementation (i.e., TreeSet) will be used. The application program (controller) invokes the cost methods for each implementation to compute the cost of using an implementation for a set instance. This cost includes the costs for invoking various operations that the client code (i.e. the application program) invokes. For instance, s1 invokes only add( ) operations, whereas s2 invokes operations to perform successor queries (not indicated in the Set class). Notice that s2 is a copy of s1, and so if s1 and s2 use different representations there is the cost of transforming the s1 implementation to the s2 implementation. The controller uses the cost information to create an Object Affinity Graph and find the desirable implementation using the techniques given in “Minimizing Interaction Costs Among Components Of Computer Programs” Ser. No. ______ (IBM Docket YOR920020022). The controller would store this result in a storage so that it can be used for subsequent (production) runs so as to achieve lower costs.

[0082] The preferred embodiments of the present invention can be realized in hardware, software, or a combination of hardware and software. A controller, according to the preferred embodiments of the present invention, can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system, or other apparatus adapted for carrying out the methods described herein, is suited. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.

[0083] The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system, is able to carry out these methods. Computer program means or computer program in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code, or notation; and b) reproduction in a different material form.

[0084] Each computer system may include, inter alia, one or more computers and at least a computer readable medium allowing a computer to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium. The computer readable medium may include nonvolatile memory, such as ROM, Flash memory, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits. Furthermore, the computer readable medium may include computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network, that allow a computer to read such computer readable information.

[0085] Although specific embodiments of the invention have been disclosed, those having ordinary skill in the art will understand that changes can be made to the specific embodiments without departing from the spirit and scope of the invention. The scope of the invention is not to be restricted, therefore, to the specific embodiments, and it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present invention.
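The controller's decision can be illustrated with a small exhaustive search. This is explicitly not the graph-cutting technique of the related application, which is not reproduced here; it is a brute-force substitute that is only practical for a handful of objects. The inputs are assumed to have been gathered on the monitoring run: a per-object cost for each candidate implementation, and a list of interacting pairs with a transformation cost that is charged only when the two objects end up with different implementations. The names AssignmentSketch, opCost, edges and choose are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration: pick one implementation per object so that the sum of
// per-object operation costs and cross-implementation interaction costs is minimal.
final class AssignmentSketch {
    /**
     * opCost[i][k] is the profiled cost of object i's own operations under implementation k.
     * Each edge is {a, b, w}: objects a and b interact, and w is added to the total
     * only when a and b are assigned different implementations.
     * Returns an array giving the chosen implementation index for every object.
     */
    static int[] choose(double[][] opCost, int[][] edges) {
        int n = opCost.length;
        int k = opCost[0].length;
        int[] current = new int[n];            // starts with implementation 0 everywhere
        int[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        while (true) {
            double cost = 0;
            for (int i = 0; i < n; i++) {
                cost += opCost[i][current[i]];
            }
            for (int[] e : edges) {
                if (current[e[0]] != current[e[1]]) {
                    cost += e[2];
                }
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = current.clone();
            }
            // Advance to the next assignment, counting in base k like an odometer.
            int pos = 0;
            while (pos < n && ++current[pos] == k) {
                current[pos] = 0;
                pos++;
            }
            if (pos == n) {
                break;                          // every assignment has been visited
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Index 0 = TreeSet, index 1 = HashSet (illustrative numbers only).
        double[][] opCost = {
            {100, 10},   // s1: add-only, cheaper as a hash set
            {20, 500}    // s2: successor queries, cheaper as a tree
        };
        int[][] edges = { {0, 1, 30} };         // s2 is a copy of s1
        System.out.println(Arrays.toString(choose(opCost, edges)));
    }
}
```

The search visits every combination, so its running time grows as the number of implementations raised to the number of objects; it is meant only to make the objective being minimized concrete.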

What is claimed is:
1. A method of minimizing the cost of using a component of a computer program, said method comprising the steps of: a) providing said component with a plurality of explicit selectable alternative implementations which share a common component interface and semantics; b) instrumenting said component to gather cost-related information during at least a partial run of said program; c) providing said component with a cost estimator for using said cost-related information to estimate a cost for using each of said explicitly selectable implementations in running said program; d) based on the costs estimated in step (c), selecting one of said explicitly selectable implementations for a subsequent at least partial run of said program.
2. A method as set forth in claim 1, wherein a default implementation is used during said at least partial run.
3. A method as set forth in claim 1, wherein selecting step (d) is carried out by an other component operable as a controller.
4. A method as set forth in claim 1, wherein selecting step (d) is carried out by an application program.
5. A computer readable medium readable by a computer and including computer instructions for carrying out a method of minimizing the cost of using a component of a computer program, said method comprising the steps of: a) providing said component with a plurality of explicit selectable alternative implementations which share a common component interface and semantics; b) instrumenting said component to gather cost-related information during at least a partial run of said program; c) providing said component with a cost estimator for using said cost-related information to estimate a cost for using each of said explicitly selectable implementations in running said program; d) based on the costs estimated in step (c), selecting one of said explicitly selectable implementations for a subsequent at least partial run of said program.
6. A computer readable medium as set forth in claim 5, wherein a default implementation is used during said at least partial run.
7. A computer readable medium as set forth in claim 5, wherein selecting step (d) is carried out by an other component operable as a controller.
8. A computer readable medium as set forth in claim 5, wherein selecting step (d) is carried out by an application program.
9. A computer program comprising a plurality of components which interact during running thereof, at least one of said components comprising: a) a plurality of explicit selectable alternative implementations; b) a common interface and semantics for receiving messages from an other one of said components and sending messages thereto; c) an instrumentation for gathering cost-related information during at least a partial run of said computer program; d) a cost estimator for using said cost-related information to estimate a cost for using each of said explicit selectable alternative implementations; and e) a selector for choosing one of said alternative implementations in response to a message received at said interface from one of an application program and an other one of said components.
10. A computer program as set forth in claim 9, wherein one of said alternative implementations comprises a default implementation which is operational before operation of said selector.
11. A computer program as set forth in claim 9, said selector being operable to choose an alternative implementation based upon a cost measurement by said instrumentation.